We suggest a new algorithm for two-person zero-sum undiscounted stochasticgames focusing on stationary strategies. Given a positive real $\epsilon$, letus call a stochastic game $\epsilon$-ergodic, if its values from any twoinitial positions differ by at most $\epsilon$. The proposed new algorithmoutputs for every $\epsilon>0$ in finite time either a pair of stationarystrategies for the two players guaranteeing that the values from any initialpositions are within an $\epsilon$-range, or identifies two initial positions$u$ and $v$ and corresponding stationary strategies for the players provingthat the game values starting from $u$ and $v$ are at least $\epsilon/24$apart. In particular, the above result shows that if a stochastic game is$\epsilon$-ergodic, then there are stationary strategies for the playersproving $24\epsilon$-ergodicity. This result strengthens and provides aconstructive version of an existential result by Vrieze (1980) claiming that ifa stochastic game is $0$-ergodic, then there are $\epsilon$-optimal stationarystrategies for every $\epsilon > 0$. The suggested algorithm is based on apotential transformation technique that changes the range of local values atall positions without changing the normal form of the game.
展开▼
机译:我们为集中于固定策略的两人零和无折扣随机游戏提出了一种新算法。给定真实的真实\ epsilon $,如果它的任意两个初始位置的值相差最多$ epsilon,则letus称为随机游戏$ \ epsilon $遍历。提议的新算法在有限时间内每$ \ epsilon> 0 $输出,或者两个参与者的一对固定策略保证任何初始位置的值都在$ \ epsilon $范围内,或者标识两个初始位置$ u $和$ v $和玩家相应的固定策略证明,从$ u $和$ v $开始的游戏价值至少为$ \ epsilon / 24 $。特别地,以上结果表明,如果随机游戏是\\ epsilon $遍历的,那么就有固定的策略让玩家证明$ 24 \ epsilon $遍历。这个结果加强了Vrieze(1980)的存在性结果,并提供了一个建设性的版本。Vrieze认为,如果一个随机博弈是$ 0 $遍历的,那么对于$> epsilon≥$$,就有$ \ epsilon $-最优静态策略。所建议的算法基于电位变换技术,该技术可在所有位置更改局部值的范围而无需更改游戏的正常形式。
展开▼